Design and Evaluation of a Switch Cache Architecture for CC-NUMA Multiprocessors

Authors

  • Ravi R. Iyer
  • Laxmi N. Bhuyan
Abstract

Cache coherent nonuniform memory access (CC-NUMA) multiprocessors provide a scalable design for shared memory, but they continue to suffer from large remote memory access latencies due to comparatively slow memory technology and large data transfer latencies in the interconnection network. In this paper, we propose a novel hardware caching technique, called switch cache, to improve the remote memory access performance of CC-NUMA multiprocessors. The main idea is to implement small fast caches in crossbar switches of the interconnect medium to capture and store shared data as they flow from the memory module to the requesting processor. This stored data acts as a cache for subsequent requests, thus reducing the need for remote memory accesses tremendously. The implementation of a cache in a crossbar switch needs to be efficient and robust, yet flexible for changes in the caching protocol. The design and implementation details of a CAche Embedded Switch ARchitecture, CAESAR, using wormhole routing with virtual channels are presented. We explore the design space of switch caches by modeling CAESAR in a detailed execution driven simulator and analyze the performance benefits. Our results show that the CAESAR switch cache is capable of reducing remote memory accesses in CC-NUMA multiprocessors by up to 45 percent for some applications. By serving remote read requests at various stages in the interconnect, we observe improvements in execution time as high as 20 percent for these applications. We conclude that switch caches provide a cost-effective solution for designing high performance CC-NUMA...
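The capture-and-serve idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the CAESAR design: the class `SwitchCache`, the function `remote_read`, the LRU replacement policy, the capacity, and the two-level switch topology are all hypothetical, and coherence actions (for example, invalidating switch copies on writes) are deliberately left out.

```python
from collections import OrderedDict


class SwitchCache:
    """Toy cache inside one interconnect switch (LRU policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block address -> data

    def capture(self, addr, data):
        """Keep a copy of a read reply flowing from memory to the requester."""
        self.blocks[addr] = data
        self.blocks.move_to_end(addr)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def lookup(self, addr):
        """Return the cached data for addr, or None on a miss."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
            return self.blocks[addr]
        return None


def remote_read(addr, path, memory, stats):
    """Issue a read along `path`, a list of switches ordered requester -> memory."""
    for switch in path:
        data = switch.lookup(addr)
        if data is not None:
            stats["switch_hits"] += 1  # served inside the interconnect
            return data
    stats["memory_accesses"] += 1      # had to go all the way to remote memory
    data = memory[addr]
    for switch in path:                # the reply deposits a copy in every switch it crosses
        switch.capture(addr, data)
    return data


if __name__ == "__main__":
    stats = {"switch_hits": 0, "memory_accesses": 0}
    leaf0, leaf1, root = SwitchCache(2), SwitchCache(2), SwitchCache(2)
    memory = {0x40: "block A"}
    remote_read(0x40, [leaf0, root], memory, stats)  # processor 0: goes to memory
    remote_read(0x40, [leaf1, root], memory, stats)  # processor 1: hits in the shared root switch
    print(stats)
```

In this sketch the second processor's read never reaches the memory module because the switch shared by both paths already holds the block, which is the kind of saved remote access the abstract refers to.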


Similar resources

Design and Evaluation of a Switch

Cache coherent non-uniform memory access (CC-NUMA) multiprocessors provide a scalable design for shared memory but they continue to suffer from large remote memory access latencies due to comparatively slow memory technology and data transfer latencies in the interconnection network. In this paper, we propose a novel hardware caching technique, called switch cache, to improve the remote memory...


Switch Cache: A Framework for Improving the Remote Memory Access Latency of CC-NUMA Multiprocessors

Cache coherent non-uniform memory access (CC-NUMA) multiprocessors continue to suffer from remote memory access latencies due to comparatively slow memory technology and data transfer latencies in the interconnection network. In this paper, we propose a novel hardware caching technique, called switch cache. The main idea is to implement small fast caches in crossbar switches of the interconnect ...


Using Switch Directories to Speed Up Cache-to-Cache Transfers in CC-NUMA Multiprocessors

In this paper, we propose a novel hardware caching technique, called switch directory, to reduce the communication latency in CC-NUMA multiprocessors. The main idea is to implement small fast directory caches in crossbar switches of the interconnect medium to capture and store ownership information as the data flows from the memory module to the requesting processor. Using the stored informatio...


Switch MSHR: A Technique to Reduce Remote Read Memory Access Time in CC-NUMA Multiprocessors

A remote memory access poses a severe problem for the design of CC-NUMA multiprocessors because it takes an order of magnitude longer than the local memory access. The large latency arises partly due to the increased distance between the processor and remote memory over the interconnection network. In this paper, we develop a new switch architecture, called Switch MSHR (SMSHR), which provides t...


Impact of Switch Design on the Application Performance of Cache-Coherent Multiprocessors

In this paper, the effect of switch design on the application performance of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors is studied in detail. Wormhole routing and cut-through switching are evaluated for these shared-memory multiprocessors that employ multistage interconnection network (MIN) and full map directory-based cache coherence protocol. The switch design also con...



Journal:
  • IEEE Trans. Computers

Volume 49


Publication date: 2000